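This article introduces autoregressive modeling with the statsmodels AutoReg model. It also covers ar_select_order, which helps select a model that minimizes an information criterion such as the AIC. An autoregressive model has the form $$ y_{t}=\delta+\phi_{1} y_{t-1}+\ldots+\phi_{p} y_{t-p}+\epsilon_{t} $$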
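
To make the equation concrete, the following minimal sketch simulates an AR(2) process with NumPy and recovers $\delta$, $\phi_1$ and $\phi_2$ by least squares on the lagged values. The helper names `simulate_ar` and `fit_ar_ols` and the parameter values are assumptions made for illustration; they are not part of statsmodels.

```python
import numpy as np

def simulate_ar(delta, phis, n, seed=0, burn=200):
    """Simulate y_t = delta + sum_i phis[i] * y_{t-1-i} + eps_t with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    p = len(phis)
    y = np.zeros(n + burn)
    eps = rng.standard_normal(n + burn)
    for t in range(p, n + burn):
        # Most recent lag first: y[t-1], y[t-2], ..., y[t-p]
        lagged = y[t - p:t][::-1]
        y[t] = delta + np.dot(phis, lagged) + eps[t]
    return y[burn:]  # drop the burn-in so the zero start-up values do not matter

def fit_ar_ols(y, p):
    """Estimate (delta, phi_1, ..., phi_p) by least squares on lagged values."""
    n = len(y)
    # Column k holds y_{t-k} for t = p, ..., n-1; the first column is the constant
    X = np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta

y = simulate_ar(delta=1.0, phis=[0.5, -0.3], n=5000)
beta = fit_ar_ols(y, p=2)
print(beta)  # estimates of (delta, phi_1, phi_2)
```

With a long simulated sample the estimates should land close to the values used in the simulation.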

AutoReg accepts the following options:

  • Deterministic terms (trend)
    • n: no deterministic term
    • c: constant (default)
    • ct: constant and time trend
    • t: time trend only
  • Seasonal dummies (seasonal)
    • True includes seasonal dummies in the model
  • Custom deterministic terms (deterministic)
    • Accepts a DeterministicProcess
  • Exogenous variables (exog)
    • A DataFrame or array of exogenous variables to include in the model
  • Omission of selected lags (lags)
    • Passing a list of lag indices includes only those lags

The complete specification is $$ y_{t}=\delta_{0}+\delta_{1} t+\phi_{1} y_{t-1}+\ldots+\phi_{p} y_{t-p}+\sum_{i=1}^{s-1} \gamma_{i} d_{i}+\sum_{j=1}^{m} \kappa_{j} x_{t, j}+\epsilon_{t} $$

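where:

  • $d_i$ is a seasonal dummy that equals 1 when $\operatorname{mod}(t, \text{period})=i$ and 0 otherwise
  • $t$ is a time trend that starts at 1
  • $x_{t,j}$ are the exogenous regressors
  • $\epsilon_t$ is a white noise process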
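
The dummy definition can be written out directly. This is a minimal NumPy sketch that follows the definition in the text; `seasonal_dummies` is a hypothetical helper, and the labels statsmodels prints for its own dummies may follow a different convention.

```python
import numpy as np

def seasonal_dummies(n_obs, period):
    """Return an (n_obs, period - 1) 0/1 matrix; mod(t, period) == 0 is the baseline."""
    t = np.arange(1, n_obs + 1)   # time starts at 1, as in the text
    season = t % period           # mod(t, period)
    # Column i-1 holds d_i, which equals 1 when mod(t, period) == i
    return (season[:, None] == np.arange(1, period)[None, :]).astype(float)

d = seasonal_dummies(n_obs=24, period=12)
print(d.shape)
```

Only $s-1$ columns are built because the constant already absorbs the omitted season.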
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns
from statsmodels.tsa.api import acf, graphics, pacf
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

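The first set of examples uses the month-over-month growth rate of U.S. housing starts that has not been seasonally adjusted. The seasonality is evident in the regular pattern of peaks and troughs. We set the frequency of the time series to "MS" (month-start) to avoid warnings when using AutoReg.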

data = pdr.get_data_fred("HOUSTNSA", "1959-01-01", "2019-06-01")
housing = data.HOUSTNSA.pct_change().dropna()
# Scale by 100 to get percentages
housing = 100 * housing.asfreq("MS")
fig, ax = plt.subplots()
ax = housing.plot(ax=ax)
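
The same two steps, the percentage change and the frequency assignment, can be tried without a network connection. The sketch below uses a four-point series with made-up values standing in for HOUSTNSA.

```python
import numpy as np
import pandas as pd

# Synthetic monthly levels standing in for HOUSTNSA (made-up numbers)
index = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"])
levels = pd.Series([100.0, 110.0, 99.0, 99.0], index=index, name="starts")

# Month-over-month growth in percent; the first value is NaN and is dropped
growth = 100 * levels.pct_change().dropna()
freq_before = growth.index.freq   # no frequency is attached yet

# asfreq attaches the month-start frequency to the index
growth = growth.asfreq("MS")
print(freq_before, growth.index.freqstr)
```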

We start by fitting an AR(3) model.

mod = AutoReg(housing, 3, old_names=False)
res = mod.fit()
print(res.summary())

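AutoReg supports the same covariance estimators as OLS. Below, we use cov_type="HC0", which is White's covariance estimator. The parameter estimates are the same, but every quantity that depends on the standard errors changes.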

res = mod.fit(cov_type="HC0")
print(res.summary())

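plot_predict visualizes forecasts. Here we plot observations 720 through 840, which extends roughly 10 years past the end of the 725-observation sample. Applied to the AR(13) model selected with ar_select_order(housing, 13, old_names=False), the forecasts show the seasonal pattern captured by the model.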

fig = res.plot_predict(720, 840)

plot_diagnostics indicates that the model captures the key features of the data.

fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)

Seasonal Dummies

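AutoReg supports seasonal dummies, which are an alternative way to model seasonality. Including the dummies shortens the dynamics to only an AR(2).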

sel = ar_select_order(housing, 13, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())

The seasonal dummies are evident in the forecasts, which have a non-trivial seasonal component in every period over the next 10 years.

fig = res.plot_predict(720, 840)

Seasonal Dynamics

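AutoReg does not directly support seasonal AR components because it estimates parameters by OLS. It can still capture seasonal dynamics through an over-parameterized seasonal AR that does not impose the restrictions of a seasonal AR.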

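We start by selecting a model with the simple method that only chooses the maximum lag. The maximum lag to check is set to 13, since this lets the model nest a seasonal AR that has both a short-run AR(1) component and a seasonal AR(1) component, so that $$ \left(1-\phi_{s} L^{12}\right)\left(1-\phi_{1} L\right) y_{t}=\epsilon_{t} $$ becomes $$ y_{t}=\phi_{1} y_{t-1}+\phi_{s} y_{t-12}-\phi_{1} \phi_{s} y_{t-13}+\epsilon_{t} $$ when expanded.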
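
The expansion can be checked numerically by multiplying the two lag polynomials. The values of $\phi_1$ and $\phi_s$ below are illustrative assumptions, not estimates from the housing data.

```python
import numpy as np

phi_1, phi_s = 0.5, 0.8  # illustrative values, not estimates from the housing data

# Lag polynomials in ascending powers of L: index k holds the coefficient on L^k
short_run = np.array([1.0, -phi_1])       # 1 - phi_1 L
seasonal = np.zeros(13)
seasonal[0], seasonal[12] = 1.0, -phi_s   # 1 - phi_s L^12

# Convolving coefficient arrays multiplies the polynomials
product = np.convolve(short_run, seasonal)

# Moving every lag term to the right-hand side flips the signs
ar_coefs = -product[1:]                   # ar_coefs[k-1] multiplies y_{t-k}
nonzero = {k + 1: c for k, c in enumerate(ar_coefs) if c != 0.0}
print(nonzero)
```

Only lags 1, 12 and 13 carry nonzero coefficients, and the lag-13 coefficient is tied to the other two. An unrestricted AR(13) estimates all thirteen coefficients freely, which is the sense in which it is over-parameterized.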


In [2]:
sns.set_style("darkgrid")
pd.plotting.register_matplotlib_converters()
# Default figure size
sns.mpl.rc("figure", figsize=(16, 6))
sns.mpl.rc("font", size=14)
In [4]:
mod = AutoReg(housing, 3, old_names=False)
res = mod.fit()
print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Fri, 12 Aug 2022   AIC                           5996.884
Time:                        14:29:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.1228      0.573      1.961      0.050       0.000       2.245
HOUSTNSA.L1     0.1910      0.036      5.235      0.000       0.120       0.263
HOUSTNSA.L2     0.0058      0.037      0.155      0.877      -0.067       0.079
HOUSTNSA.L3    -0.1939      0.036     -5.319      0.000      -0.265      -0.122
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------
In [5]:
res = mod.fit(cov_type="HC0")
print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Fri, 12 Aug 2022   AIC                           5996.884
Time:                        14:29:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.1228      0.601      1.869      0.062      -0.055       2.300
HOUSTNSA.L1     0.1910      0.035      5.499      0.000       0.123       0.259
HOUSTNSA.L2     0.0058      0.039      0.150      0.881      -0.070       0.081
HOUSTNSA.L3    -0.1939      0.036     -5.448      0.000      -0.264      -0.124
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------
In [6]:
sel = ar_select_order(housing, 13, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                    AutoReg(13)   Log Likelihood               -2676.157
Method:               Conditional MLE   S.D. of innovations             10.378
Date:                Fri, 12 Aug 2022   AIC                           5382.314
Time:                        14:29:21   BIC                           5450.835
Sample:                    03-01-1960   HQIC                          5408.781
                         - 06-01-2019                                         
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            1.3615      0.458      2.970      0.003       0.463       2.260
HOUSTNSA.L1     -0.2900      0.036     -8.161      0.000      -0.360      -0.220
HOUSTNSA.L2     -0.0828      0.031     -2.652      0.008      -0.144      -0.022
HOUSTNSA.L3     -0.0654      0.031     -2.106      0.035      -0.126      -0.005
HOUSTNSA.L4     -0.1596      0.031     -5.166      0.000      -0.220      -0.099
HOUSTNSA.L5     -0.0434      0.031     -1.387      0.165      -0.105       0.018
HOUSTNSA.L6     -0.0884      0.031     -2.867      0.004      -0.149      -0.028
HOUSTNSA.L7     -0.0556      0.031     -1.797      0.072      -0.116       0.005
HOUSTNSA.L8     -0.1482      0.031     -4.803      0.000      -0.209      -0.088
HOUSTNSA.L9     -0.0926      0.031     -2.960      0.003      -0.154      -0.031
HOUSTNSA.L10    -0.1133      0.031     -3.665      0.000      -0.174      -0.053
HOUSTNSA.L11     0.1151      0.031      3.699      0.000       0.054       0.176
HOUSTNSA.L12     0.5352      0.031     17.133      0.000       0.474       0.596
HOUSTNSA.L13     0.3178      0.036      8.937      0.000       0.248       0.388
                                    Roots                                     
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1             1.0913           -0.0000j            1.0913           -0.0000
AR.2             0.8743           -0.5018j            1.0080           -0.0829
AR.3             0.8743           +0.5018j            1.0080            0.0829
AR.4             0.5041           -0.8765j            1.0111           -0.1669
AR.5             0.5041           +0.8765j            1.0111            0.1669
AR.6             0.0056           -1.0530j            1.0530           -0.2491
AR.7             0.0056           +1.0530j            1.0530            0.2491
AR.8            -0.5263           -0.9335j            1.0716           -0.3317
AR.9            -0.5263           +0.9335j            1.0716            0.3317
AR.10           -0.9525           -0.5880j            1.1194           -0.4120
AR.11           -0.9525           +0.5880j            1.1194            0.4120
AR.12           -1.2928           -0.2608j            1.3189           -0.4683
AR.13           -1.2928           +0.2608j            1.3189            0.4683
------------------------------------------------------------------------------
In [7]:
fig = res.plot_predict(720, 840)
In [8]:
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)
In [9]:
sel = ar_select_order(housing, 13, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:               Seas. AutoReg(2)   Log Likelihood               -2652.556
Method:               Conditional MLE   S.D. of innovations              9.487
Date:                Fri, 12 Aug 2022   AIC                           5335.112
Time:                        14:29:22   BIC                           5403.863
Sample:                    04-01-1959   HQIC                          5361.648
                         - 06-01-2019                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.2726      1.373      0.927      0.354      -1.418       3.963
s(2,12)        32.6477      1.824     17.901      0.000      29.073      36.222
s(3,12)        23.0685      2.435      9.472      0.000      18.295      27.842
s(4,12)        10.7267      2.693      3.983      0.000       5.449      16.005
s(5,12)         1.6792      2.100      0.799      0.424      -2.437       5.796
s(6,12)        -4.4229      1.896     -2.333      0.020      -8.138      -0.707
s(7,12)        -4.2113      1.824     -2.309      0.021      -7.786      -0.636
s(8,12)        -6.4124      1.791     -3.581      0.000      -9.922      -2.902
s(9,12)         0.1095      1.800      0.061      0.952      -3.419       3.638
s(10,12)      -16.7511      1.814     -9.234      0.000     -20.307     -13.196
s(11,12)      -20.7023      1.862    -11.117      0.000     -24.352     -17.053
s(12,12)      -11.9554      1.778     -6.724      0.000     -15.440      -8.470
HOUSTNSA.L1    -0.2953      0.037     -7.994      0.000      -0.368      -0.223
HOUSTNSA.L2    -0.1148      0.037     -3.107      0.002      -0.187      -0.042
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -1.2862           -2.6564j            2.9514           -0.3218
AR.2           -1.2862           +2.6564j            2.9514            0.3218
-----------------------------------------------------------------------------
In [10]:
fig = res.plot_predict(720, 840)
In [11]:
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(lags=30, fig=fig)
In [12]:
yoy_housing = data.HOUSTNSA.pct_change(12).resample("MS").last().dropna()
_, ax = plt.subplots()
ax = yoy_housing.plot(ax=ax)